Effective ranking with arbitrary passages

Authors

Marcin Kaszkiel

Justin Zobel

Abstract

Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material amongst otherwise irrelevant text. In this paper, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents.
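As a rough illustration of the idea described in the abstract, the sketch below splits each document into overlapping fixed-length passages that may start at any word position, then ranks documents by their best-scoring passage. The function names, the toy score (a raw count of query-term occurrences), and the choice to rank by the single best passage are assumptions made for this example; they are not the similarity measure or the exact procedure used in the paper.

```python
from collections import Counter


def fixed_length_passages(words, length, step=1):
    """Yield (start, passage) for overlapping windows of `length` words.

    With step=1 a passage can start at any word position.
    Documents shorter than `length` yield a single passage.
    """
    if len(words) <= length:
        yield 0, words
        return
    for start in range(0, len(words) - length + 1, step):
        yield start, words[start:start + length]


def passage_score(passage, query_terms):
    """Toy score (an assumption): count query-term occurrences."""
    counts = Counter(passage)
    return sum(counts[term] for term in query_terms)


def rank_by_best_passage(docs, query, length=5, step=1):
    """Rank documents by their highest-scoring passage.

    Returns a list of (doc_id, best_score, best_start), best first.
    Ties between passages go to the earliest start position.
    """
    query_terms = set(query.lower().split())
    ranked = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        best_score, neg_start = max(
            (passage_score(passage, query_terms), -start)
            for start, passage in fixed_length_passages(words, length, step)
        )
        ranked.append((doc_id, best_score, -neg_start))
    # Highest score first; break ties by document id for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    docs = {
        "long": "filler " * 20 + "passage ranking works" + " filler" * 20,
        "short": "ranking",
    }
    for row in rank_by_best_passage(docs, "passage ranking", length=5):
        print(row)
```

Because every passage has the same length, scores in this sketch are comparable across documents of very different sizes, which is the property the abstract highlights. A short block of relevant text inside a long document can still produce a high-scoring passage.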


Similar resources

TREC 7 Ad Hoc, Speech, and Interactive tracks at MDS/CSIRO

1 Overview. For the 1998 round of TREC, the MDS group, long-term participants at the conference, jointly participated with newcomers CSIRO. Together we completed runs in three tracks: ad-hoc, interactive, and speech. 2 Ad-hoc task. In TREC-5 we used document retrieval based on arbitrary passages [8, 9], or fixed-length passages that could start at any word position. Although far from the best runs i...


A Phased Ranking Model for Information Systems

To effectively sort and present relevant information pieces (e.g., answers, passages, documents) to human users, information systems rely on ranking models. Existing ranking models are typically designed for a specific task and therefore are not effective for complex information systems that require component changes or domain adaptations. For example, in the final stage of question answering, ...


Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering

A popular recent approach to answering open-domain questions is to first search for question-related passages and then apply reading comprehension models to extract answers. Existing methods usually extract answers from single passages independently. But some questions require a combination of evidence from across different sources to answer correctly. In this paper, we propose two models which...


Boosting weak ranking functions to enhance passage retrieval for Question Answering

We investigate the problem of passage retrieval for Question Answering (QA) systems. We adopt a machine learning approach and apply to QA a boosting algorithm initially proposed for ranking a set of objects by combining baseline ranking functions. The system operates in two steps. For a given question, it first retrieves passages using a conventional search engine and assigns each passage a ser...


TREC 7 Ad Hoc, Speech, and Interactive Tracks at MDS/CSIRO: 2.1 System Description

In TREC-5 we used document retrieval based on arbitrary passages [8, 9], or fixed-length passages that could start at any word position. Although far from the best runs in TREC-5, these results were promising, in particular for long documents. In TREC-6 we continued with arbitrary passages, but our main emphasis was on comprehensive factor analysis of successful automatic query expansion and refine...



Journal:

JASIST

Volume 52

Publication date: 2001
